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We generalized the recently introduced new impurity solve]— based on the diagrammatic expan- 
sion around the atomic limit and Quantum Monte Carlo summation of the diagrams. We present 
generalization to the cluster of impurities, which is at the heart of the cluster Dynamical Mean-Field 
methods, and to realistic multiplet structure of a correlated atom, which will allow a high precision 
study of actinide and lanthanide based compounds with the combination of the Dynamical Mean- 
Field theory and band structure methods. The approach is applied to both, the two dimensional 
Hubbard and t-J model within Cellular Dynamical Mean Field method. The efficient implementa- 
tion of the new algorithm, which we describe in detail, allows us to study coherence of the system at 
low temperature from the underdoped to overdoped regime. We show that the point of maximal su- 
perconducting transition temperature coincides with the point of maximum scattering rate although 
this optimal doped point appears at different electron densities in the two models. The power of the 
method is further demonstrated on the example of the Kondo volume collapse transition in Cerium. 
The valence histogram of the DMFT solution is presented showing the importance of the multiplet 
splitting of the atomic states. 
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I. INTRODUCTION 

One of the most active areas of condensed matter 
theory is the development of new algorithms to sim- 
ulate and predict the behavior of materials exhibiting 
strong correlations. Recent developments of the pow- 
erful many-body approach, the Dynamical Mean Field 
Theory (DMFT)2i2i4 and its cluster extensions^, hold a 
great promise in being able to accurately predict phys- 
ical properties of this challenging class of materials. In 
recent years, the DMFT method has substantially ad- 
vanced our understanding of the physics of the Mott 
transition and demonstrated its power to explain such 
problems as the structural phase diagrams of actinides^i^, 
phonon response^, optical conductivity^, valence and X- 
ray absorptionii and transport 1 ^ of some of the archetype 
materials with strong correlations. 

The success of the Dynamical Mean Field theory in the 
context of the electronic structure revitalized the search 
for fast and flexible impurity solvers which could treat 
Hund's coupling and spin-orbit coupling of the parent 
atomic constituents in the crystal environment of the lat- 
tice. Further, the newly developed cluster extensions of 
DMFT— £ require faster impurity solver which could ac- 
cess low temperature strong correlation limit. 

Many impurity solvers were developed over the 
last few decades, including Hirsh-Fye Quanum Monte 
Carlo MethocU^, exact diagonalization 1 ^, non-crossing 
approximation 1 ^, and its extensions like one-crossing 
approximation^^ or SUNCAil, iterative perturbation 
theory^, Wilson's numerical renormalization group^ ex- 
pansion around the atomic limits, and many others. 

Each method has some advantages and disadvantages, 
but at the present time, there is no method that works 
efficiently and produces accurate solutions for the Greens 



function in all regimes of parameters. Concentrating only 
on the most often employed method, numerically exact 
Hirsch-Fye Quanum Monte Carlo, the following weak- 
nesses limit its usefulness in many realistic materials and 
clusters of strongly correlated models: i) It can not treat 
realistic multiplet structure which is very important in 
strongly correlated / materials, ii) the discretization 
of the imaginary time leads to considerable systematic 
error— and requires extrapolation to mfmitesimally small 
time slices, iii) the low temperatures regime in the strong 
correlation limit of large U is computationally very ex- 
pensive since it requires many time slices and infinite U 
models like t-J model are unaccessible. 

All of the above mentioned shortcomings of the conven- 
tional Quantum Monte Carlo algorithm are eliminated 
by the novel Continuous Time Quantum Monte Carlo 
Method±i2ii22i2I. In addition, the new method is even 
much faster for most of applications we tested, including 
cluster DMFT for the Hubbard model and application of 
LDA+DMFT to the Actinides. 

The basic idea and its implementation for the Hub- 
bard model was recently published in Ref. [H Further 
extensions to more general impurity model was imple- 
mented in Ref. I22I . First demonstration of the power of 
our implementation was presented in Ref. [3 by detail 
low temperature study of sodium doped cobaltates and 
in Ref. [ll| to study Plutonium valence. 

Here we want to describe the powerful new implemen- 
tation of the method for applications to realistic mate- 
rials with complicated multiplet structure, including all 
interaction terms: Hubbard U, Hund's coupling and spin- 
orbit coupling. The extension to clusters of few sites and 
superconducting state within cluster DMFT will be ad- 
dressed and its power will be demonstrated by studying 
the low temperature coherence of both the Hubbard and 
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the t-J model. The Kondo volume collapse transition in 
elemental Cerium will be reexamined showing the valence 
histogram of the alpha and gamma phase of the material. 
More technical details of the implementation are given in 
the Appendix 

II. FORMALISM 

In this section, we will explain the main steps of the 
recently developed Quantum Monte Carlo Method with 
emphasis on the generalization to clusters and multiplcts 
of real materials. 

The cluster Dynamical Mean Field approach can be 
conveniently expressed by the functional of the local 
Green's function^ 

r[G ioc ] = Trlog(Go 1 - S) - Tr[£G] + $[G /oc ]. (1) 

Here G/ oc stands for the Green's function of the cluster 
or single site under consideration and is in general a ma- 
trix of the size equal to the number of sites times the 
number of correlated orbitals per site. Further, Go is the 
non-interacting Green's function Gq 1 = o> + /j + V 2 + V ex t 
and contains all quadratic terms of the Hamiltonian in- 
cluding periodic ionic potential of the crystal (V ex t)- The 
interacting part of the functional ^[Gioc] contains all two 
particle irreducible skeleton diagrams inside the chosen 
cluster, i.e., £ = 5$[Gi oc ]/5G. 

Within the DMFT method, the summation of all the 
diagrams is achieved by solving the corresponding quan- 
tum impurity problem 

Z = j Z)[V>V]e~ Sc ~^ dr/ ° 5dT ' E -' ^M A -'( r < r >^')(2) 

where the Anderson impurity model hybridization A 
term plays the role of the generalized Weiss field that 
needs to be added to the cluster effective action S c to 
obtain the local Green's function. The self-consistency 
condition which determines this Weiss field (hybridiza- 
tion A) is 

G " np = u-E imp -X-A = Gloc = ^ G^kFs (3) 

In the general impurity problem defined in Eq. ([2]) the 
electrons in the cluster S c hybridize with a matrix of 
Weiss fields A Q/ 3 . In the case of single-site DMFT for an 
/ material, this hybridization is a 14 x 14 matrix while in 
cluster DMFT for plaquette, it is 8 x 8 matrix. In some 
cases, hybridization can be block diagonalized as we will 
show in the appendix[A]below on the example of Cellular- 
DMFT for the normal state of the Hubbard model. How- 
ever, in the superconducting state of the same model, 
the hybridization acquires off-diagonal components due 
to anomalous components of self-energy. 



The cluster part of the action S c can be very com- 
plicated and the power of this method is that it can 
treat arbitrary interaction within the cluster. For real 
materials study, the most important on-site terms are 
the Hund's couplings which are usually expressed by the 
Slater integrals 2 ^ (i*2, i*k, Fq in case of / electrons). In 
addition, there is spin-orbit coupling Hso = £ 1 • s and 
crystal field splittings as well as various hoppings and 
non-local interactions within the cluster. 

The continuous time impurity solvers are based on 
the diagrammatic expansion of the partition function 
and stochastic sampling of the diagrams. Two expan- 
sions were recently implemented: the expansion around 
the band limit=S and the expansion in the hybridization 
strength^. The later seems to be superior in the strongly 
correlated regime due to substantial reduction of the size 
of matrices that need to be manipulated^ and, most im- 
portantly, the empirical finding is that the minus sign 
problem in this approach is severely reduced or maybe 
even eliminate d 21 ! 22 . 

The expansion in hybridization strength has a 
long history starting with the famous Non-Crossing 
Approximation^ and various extensions of it such as 
OCAi^, CTMA21, SUNCAii. All these approximations 
can be viewed as partial summation of the same type of 
diagrams. With stochastic sampling, the summation of 
essentially all the diagrams is now possible. The only 
weakness of the approach is that it works exclusively with 
the imaginary time Green's functions and analytic con- 
tinuation to real axis and access to real axis self-energy, 
for example, is still hard to achieved. However, we believe 
that the substantially enhanced precision of the method, 
as compared to Hirsh-Fye QMC, will make this step eas- 
ier. 

The idea of expanding partition function in terms of 
the hybridization with the conduction band, dates back 
to work of G. Yuval and PW. Anderson^. In this work, 
mapping the hybridization expansion to Coulomb gas 
model lead to one of the first breakthroughs in the area 
of the Kondo problem. Generalization to asymmetric 
Anderson model was published by H.D.M. Haldane in 
Ref. |29| . Similar approach was later used in exploring the 
physics of the generalized Hubbard model in the context 
of DMFT—. In this work, the hybridization expansion of 
the partition function was analyzed by renormalization- 
group analysis technique. An early implementation of 
the related idea to sum up the terms of the partition 
function by Monte Carlo sampling was implemented in 
Refl3ll to solve the two two-channel Anderson impurity 
model. 

Sampled Partition Function: In the new Quantum 
Monte Carlo method, the partition function is expanded 
in terms of hybridization strength A and the resulting 
diagrams are summed up by stochastic Metropolis sam- 
pling. Taylor expansion of the impurity partition func- 
tion Eq. (2J gives 



3 



Z = f D [^H]e- s ^h 



by separating the cluster contribution from the bath con- 
I 



It becomes clear that partition function is a product of 
two terms: the average over the cluster states ip and av- 
erage over the bath degrees of freedom A. It was pointed 
out in Ref. [l| that naive sampling of the above diagrams 
would run into a very bad minus sign problem. The rea- 
son is that crossing diagrams (vertex corrections to fa- 
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(4) 
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tribution the partition function can be cast into the form 



(5) 

I 

mous Non-Crossing Approximation to Anderson Impu- 
rity Model) can have either sign and thus weights that 
correspond to crossing diagrams could be negative. The 
ingenious idea proposed in Ref. [l] is to combine all the di- 
agrams of the same order, crossing and non-crossing, into 
a determinant. Mathematically, this can be expressed by 
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A QfcQ < (r fc ,T(.) j 



where Z c = J D[^^p]e Sc and average of the operator 
is {0) c i U ster = -g- J D[if,^ 'tp]e~ ~ s O '. This is the central 
equation of the Continuous Time Monte Carlo sampling 
around the atomic limit. To derive Eq. ([6]) from Eq. ([5]) 
one needs to permute time integration variables r in all 
possible ways. Permutation of fcrmions gives minus sign 
in an odd permutation. This minus sign can be absorbed 
in the minus sign of the product of hybridizations, result- 
ing in the determinant of hybridizations. 

Simulation: The set of diagrams, which are 
associated with the set of imaginary times 
{n , t{ , T2, T2 , • ■ • , Tfc , r' k } and corresponding band 
indexes {cti, Oi' 11 a 2l ot' 2 , • • ■ ,ctk,a k } are visited by 
Monte Carlo (Metropolis) algorithm with the weights 
given by Eq. ©. The effect of the hybridization 
tp(T')tp^ (t)A(t — t') is to create a kink in the time 
evolution of the cluster, i.e., to destroy one electron at 
time t' on the cluster and create another electron at 
some other time r on the cluster. The number of kinks 
is always even due to particle number conservation. 



Two Monte Carlo steps which need to be implemented 
are: i) insertion of two kinks at random times r new and 
T' new (chosen uniformly [0, (3)), corresponding to a ran- 
dom baths a, a' . ii) removal of a two kinks by remov- 
ing one creation operator and one annhilation operator. 
Many other steps can greatly reduce the sampling time, 
for example displacing a randomly chosen operator (ei- 
ther ip or ^ ) t° a new location chosen uniformly [0,/?). 
The double step of inserting or removing two kinks is also 
possible and is relevant when off-diagonal components of 
A are dominant. 

The detailed balance condition requires that the prob- 
ability to insert two kinks at random times r, r', being 
chosen uniformly in the interval [0,/5), is 

■ ( ft Z new D n ew -. /„n 

"add = mm I , 1 (7) 

\k + lj Z id V id 

where Nb is the number of baths, k is the current per- 
turbation order (number of kinks/2), Z new is the cluster 
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matrix clement 



is 



'VV.faJV'Lfa)) cluster(&) 



and "D new /T^ old is the ratio between the new and the old 
determinant of baths A and can be evaluated using usual 
Shermann-Morrison formulas. The factors of (f3Nb) enter 
because of the increase of the phase space when adding 
a kinks (increase of entropy) while the factor l/(fc + 1) 
comes from factorials in Eq. [BJ Similarly, the probability 
to remove two kinks, chosen randomly between [1 • ■ • k] 



P, 



remove 



(3 Nb ) Z id T) id 



(9) 



An important simplification occurs if the hybridization 
is block diagonal. Since A aa / vanishes for some combi- 
nation of act' the determinant in Eq. ([6]) can be written 
as a product of smaller determinants, one for each block 
of hybridization. A specially simple case occurs when all 
the blocks are of size one, the partition function becomes 
a product of Nf, terms, where Nf, is number of all baths 
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The probability to add two new kinks Eq. (J7J) now de- 
pends on the number of kinks of this particular type 
k a (rather than total perturbation order k) and the di- 
mension of the bath subspace Nb becomes unity. In 
general the probability to add two kinks is P a dd = 

2 _ 



k +1 j z -™ t>""Z ' wnerc -^6* i s the number of bands 
which form an off-diagonal sub-block in hybridization A 
and need to be treated simultaneously in one determinant 
V a in Eq. ©. 

The size of hybridization determinants can thus be 
greatly reduced if hybridization A is block diagonal. The 
cluster term Z however cannot be broken into separate 
contributions for each bath, rather the full trace needs to 



be computed numerically. It is therefore essential to find 
a fast way to compute the cluster average (■ • • } cluster in 
Eq. ©. 

Exact Diagonalization of the Cluster: It is crucial to 
evaluate the cluster trace in eigenbase of the cluster 
Hamiltonian and take into account the conservation of 
various quantum numbers, such as the particle number, 
the total spin, and the total momentum of the cluster 
states. 

Typical contribution to the cluster part of the trace 
that needs to be evaluated at each Monte Carlo step, 
takes the form 



Zv = Tr (T T e- d ™^H ai fa)Vd 2 fa) ■ ■ ■ ^ fa)) (11) 

E p -E ml T[, F a 1 \ p -E m2 (T 2 -T[)/ p\a 2 \ . . . ( TP^n-i) p -E mn {r' n _ y -T n ) I jrfa n \ p -E ml (fj-r n ) 

c V Jmim2 t - V )rri2ra^ V Jrri n — im n \ }rn n ra\^- 



where the matrix elements are {F^ ai ) nm = {n\^y a \m) 
and E m are eigenvalues of the cluster. The actual order 
of operators in Eq. (jlip depends on their time arguments 
and creation operator is not necessary followed by annihi- 
lation operator. 

The bottleneck of the approach is that the number of 
cluster states \m) is very large (for example, single site 



DMFT for the / shell requires 2 14 states). Consequently, 
the matrix elements F are in general very large matrices 
and the typical diagram order is inversely proportional 
to temperature (sec Fig. [T]) therefore one typically needs 
to multiply few hundred large matrices at each Monte 
Carlo step. It is clear that this is very impractical and 
the progress can be achieved only by various trick which 
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we have implemented 

• Most of matrix elements vanish. A fast algorithm 
is needed to determine which matrix elements are 
nonzero. 

• Symmetries of the problem can be taken into ac- 
count to reduce the size of the F matrix. 

• The number of trial steps is usually much bigger 
(100 times) than the number of accepted steps and 
the insertion or removal of a kink is very local in 
time operation. It is convenient to store the prod- 
uct Eq. [11] (from both sides, left and right) and 
when trying to insert a new kink, recompute the 
trace only between the inserted times T new and 
t' 

new 

• During simulation, the probability for visiting any 
cluster state can be recorded and can be used in the 
next step to eliminate the irrelevant atomic states 
from the trace in Eq.[TlJ The cluster base can hence 
be adjusted dynamically to describe the particular 
regime studied by the minimum number of relevant 
cluster states. 

To illustrate the method, let us consider a concrete 
example of the cluster of one band model (Hubbard or t- 
J) in the normal state. The bath index a runs over cluster 
momenta q and spin orientation a. The eigenstates of the 
cluster can be written in a form \N,S Z ,K; Sj), where N 
is total number of electrons in the state, S and S z are 
the total spin and its z components, while K is the total 
momentum of the cluster state and 7 stands for the rest 
of the quantum numbers. 

Concept of Superstates: In this base, the matrix el- 
ements of the creation operator are greatly simplified 
V4. ff |A^ 2 ,K;S 7 ) = |iV + l,^ + cr,K + q;S±l/2,7) 
because the creation operator is nonzero only between 
Hilbert subspace of {N, S z , K} and {N+l, S z +a, K+q}. 
It is therefore convenient to group together states with 
the same {N, S Z ,~K} and treat the rest of the quan- 
tum numbers as internal degrees of freedom of a clus- 
ter superstate \i) = \{N, S Z ,~K}}. The superstate \i) is 
multidimensional state with internal quantum numbers 
\m[i]) = I {S, 7)}. It is then clear that creation operator 
acting on a state \i) gives a unique state \ j) = ipq a \i) and 
it is enough to store a single index array F Q t(i) = j to 
figure out how the Hilbert subspaces are visited under 
application of a sequence of creation and annhilation op- 
erators such as in Eq. (fTTj) : io — > F ai (io) — > ■■■ik = 
F' fc " k (ifc-i). This sequence is very often truncated in 
few steps only, because most of the sequences contain 
either multiple application of the same creation or annhi- 
lation operator (Pauli principle) or because they lead to 
a state outside the base (for example ip\N = 0...) = or 
tp^\N = max...) = 0). 

Once the nonzero matrix elements are found by this 
simple index lookup, the value of the matrix element 
needs to be computed by matrix multiplication. By this 



breakup of the Hilbert space and introducing superstates 
\i), the largest matrix which needs to be treated for the t- 
J model on a plaquette is 3 x 3 and for the Hubbard model 
it is 6 x 12, thus substantially smaller than the original 81 
and 265 dimensional Hilbert space. To compute the trace 
in Eq. (jTTJ) we start with unity matrix in each subspace 
of a superstate \i) and apply both the time evolution op- 
erator e~ EmtyTl ~ Tl ^ (by multiplying each row of a matrix 
with its time evolution) and the kink (by multiplication 
with the matrix (F a ) mn or (F a ') mn ). The operation of 
F brings us to the next superstate F a (i) where we repeat 
both the time evolution and the kink application. After 
k steps, the trace of a matrix gives the desired matrix 
element. 

Storing the Time Evolution: The number of kinks is 
proportional to inverse temperature (3 and kinetic en- 
ergy of the system (k) = /3\E kin \ (see Eq. (|52"j)). It thus 
becomes large at low temperatures. However, an inser- 
tion of a kink with large time difference (ijj' (t)i/j(t') with 
large t—t') has a very low probability. The reason is that 
Pauli principle forbids insertion of the pair ^J,(T)^ a (r') 
if another kink of the same species a is between the two 
times (t, t'). At the same time, A(r) is like G(r) peaked 
at small times t — t' and falls of at large times making 
the long time intervals rare. 

The insertion of a kink is thus fairly local in time op- 
eration, therefore it is convenient to store a whole chain 
of products that appear in Eq. (fTTj) from both sides, left 
and right, to make trial step very cheap. It takes only 
few matrix multiplications (almost independent of tem- 
perature) to compute the trace in Eq. ([TT]) . When the 
move is accepted, the trace needs to be updated which 
takes somewhat more time. However, the acceptance rate 
is typically small and on average does not require much 
computational time. 

Adjusting the Cluster Base: Finally, the ultimate 
speedup can be achieved by dynamically adjusting "the 
best" cluster base. The probability for a cluster state \m) 
can be computed during simulation. For a given diagram 
with particular configuration of kinks, the probability for 
a cluster state \m) is proportional to its matrix element 
defined in Eq. (TiTj) . i.e., 

p _ (m\e^ HT ^ a e~ H ^~ T ^ ■ ■ ■ jm) 

2-i{n}(n\e- Hl ~i jl! a e- H ^—ri)...\ n ) 

The sampled average of this quantity gives probability for 
cluster eigenstatc \m). A large number of cluster states 
have very small probability and can be eliminated in sim- 
ulation to ultimately speed up the simulation. It is im- 
portant that the probability for any cluster state depends 
on the particular problem at hand and the program ad- 
justs the base dynamically after a few million of Monte 
Carlo steps. 

Green's junction Evaluation: Like in other impurity 
solvers which are based on the expansion of the hybridiza- 
tion (NCA, OCA, SUNCA), the Green's function is com- 
puted from the bath electron T matrix. Using equation 
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of motion, it is easy to sec that the bath Green's func- 
tion Gkk' is connected to the local Green's function G\ oc 
through the following identity 

G kk ,{T-T')=5 kkl g kk {T-T') (13) 

+ / dT s d,T e g k (T - T e )V k Gi oc (T e - T s )V k >gk>(T s - t') 
Jo Jo 

or summing over momenta 

Q(t -t') = J2 VkG W (r - T>)V k . = A(r - r') 

kk' 

+ / / dT s dr e A(t - T e )G loc {r e - t s )A(t s - t'). (14) 
Jo Jo 

This Green's function Q is equal to the ratio between the 
determinant of A's 



a column at i a and row at i e to matrix M , leads to the 
following relation between the matrix M new and M Q id 



M™ w = M°l d + P L J R i 



(18) 



Here one row and one column of zeros is added to M old 
to match the size of M new . The arrays L and R are given 
by 



where 



L = (Li, ■ ■ ■ , Li e -i, — 1, • • • , Lk) 
R = (Ri, • • • , Ri„-i, —1, • • • , R k ) 
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and p is 

- = A(r - r') - £ A(r - Tj)M°- d A(Ti - r'). (23) 
Finally, the local Green's function becomes 



where one row and one column is added to the bath elec- 
tron determinant. The reason for this simple form is that 
the conduction band is non-interacting and thus obeys 
Wicks theorem. Here we used the definition 



G" 



M" 1 
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V 



(15) 



Using Shermann Morrison formulas to express enlarged 
determinant by the original determinant of A, Q becomes 

G(t - t') = A(r -t')-J2 A (r - njM^ ls A(r ts - r'()16) 



(24) 

It is clear from Eq. that only linear amount of time 
is needed to update the local Green's function. 

When removing two kinks of construction and annihi- 
lation operators at i s and i e , old and new matrix M are 
related by 



Finally, comparing Eq. (|T4j) and (fT6|) we sec that 
Gioc(r e — t s ) = —Mi e ^ a and in imaginary frequency 



M°l d M° l f 
iV1 ij — lvl i 3 M old 



Green's function therefore becomes 



(25) 



1 — ly <'~" M, i Tl/, i e 



(17) 



This equation is the central equation of the approach 
since it relates local green's function with the quantities 
computed in QMC importance sampling. The equation 
shows that only matrix M = (A) -1 needs to be stored 
and manipulated in simulation. 

From the above consideration, it is clear that one could 
also add two rows and two columns to matrix A and 
compute the two particle vertex function in similar way 
without much overhead. 

Fast Updates: The Green's function can be updated 
in linear time (rather than quadratic). When adding a 
construction and annihilation operator at t and r', adding 



(26) 

Finally, the exponential factors e tu}Ti do not need to 
be recomputed at each Monte Carlo step since all "old" 
times can be stored and the exponents need to be com- 
puted only for the new pair of times and only at each 
accepted move. In the present implementation, the algo- 
rithm to sample directly G(icu) is sufficiently fast that it 
does not introduce any performance costs. Since it does 
not introduce systematic error in binning G(t) we believe 
it is superior than the alternative implementations which 
sample G(t). 

Large Frequencies and Moments: Similarly to the 
Hirsch-Fyc QMC, the low frequency points of Green's 
function converge very fast to the exact value while the 
high-frequency points, when sampled directly, contain a 
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lot of noise. It is therefore not very useful to sample 
large frequencies in the above described way. Usually we 
sample 200-300 Matsubara points while the rest are re- 
placed by the high frequency moments of the self-energy 
computed analytically. 

The high frequency moments of the self-energy are 
computed from the Green's function moments, which, in 
general, take the following form 

-ir{{[H,[H,...[H,^ a }...}Ul}) (27) 

To compute the moments within the present approach, 
few operators need to be sampled in simulation. In the 
one-band model, only density is required, but in more 
complicated situation, higher order density-density and 
exchange terms enter. In general, Green's function mo- 
ments can be expressed by the average of a few equal 
time operator O mm ', which take particular simple form 
in the local eigenbase 



,a/3 



G,af3 
l.mm' 



- E„ 



(28) 



™£A = E^WCi^WCi?™ - E n ) 2 (29) 

n 

~^~(E )mn(-^ )nm'(^n ^m') ■ 

The lowest order self-energy moments can then be com- 
puted in the following way 



£ Q(3 (oo) = (mf M3 ) - E, 

=3 = <^> - (mfn 



imp 
G 7 af3\2 



(30) 
(31) 



Sampling of other Quantities: Equal time operator like 
density or those in Eq. (f2"5| and (f2"9"| can be straightfor- 
wardly computed in simulation. To improve the precision 
of simulation, operators are averaged over all times, i.e., 

(0> = 44 f dT Tr[T T e-S;° H ^O{T )e-^ HdT ] (32) 
Z P Jo 

The contribution to (O) of each particular diagram is 
i fc rn+i 



If operator O is one of the good quantum numbers of 
a superstate (one of {N, S* 2 ,K} in our example above), 
matrix O mi ,m 2 is proportional to identity matrix in each 



superstate O mi \ 



2 Oi. Starting with a su- 



perstate i, kinks of a diagram T> generate a sequence of 
superstates i — > j\ = F ai (i) — > j2 = F' a2 (ji) • 
as explained above. The equal time average of cluster 
conserved quantity therefore simplifies to 



i^super states 



1=1 



n+i - ti 

(3 



O, 



(35) 



where Pi is probability for a superstate \i) defined in 
Eq. (|12p and Oj t is the value of the conserved quantity 
in each superstate ji in the interval 77 +1 — 77. This time 
difference is the time each superstate "lives" in the par- 
ticular diagram. 

In this way, the electron density rif and average mag- 
netization S z is computed to very high precision. Similar 
simplification in evaluating the total spin susceptibility, 
defined by 



1 r fi 

XM = - / dr / dr'{S z {T)S z {r')) 
P Jo Jo 



(36) 



where S z is the total spin of the cluster (or atom) leads 
to 



E * 

i£ super states 



1 = 1 



(37) 



where (S z )j is the total spin of the cluster in cluster eigen- 
state j. 

If O does not commute with H c , the integral (|33[) needs 
to be evaluated. In cluster eigenbase it takes the form 



1 



E ( T / e/t )r 



-Em 2 (n+1-Tl) 



-E mi (n + l -T l ) 



lmm\m2 



fl(E mi — E m2 ) 



X(l (rprights 
-^^771^2^^; )m 2 m 



(38) 



-Xeft -E{r -n) 



O(r )e 



-E(n + 1 -T )rpright 



k-l 



1=1 



(33) 

where Zx> is the cluster part of the trace Eq. (fTT]) . T 1 ^ 1 
are the operators in Eq. (fTTj) which appear before To and 
j,rig/it con ^- ams p era tor with time arguments greater 
than To- k is the number of kinks in the diagram T>. 
As we mentioned before, these time evolution operators 
(T. , x[ l9ht ) are stored and regularly updated during 
simulation. 

Further simplification is possible, if operator (O) com- 
mutes with H, 



Sampling the total energy: The average of the potential 
energy (V ) can be computed from the average of the local 
energy since 

afiyS a/3 



= (V) + Tr[E imp n] 



(39) 



cluster 



(0) v 



ti+1 - n 

(3 



j> v ^ e ft e -E(T l + 1 ~Ti) Qrpright-^ 



Here we concentrate on the case of general single site 
DMFT rather than the cluster extensions. The reason is 
that kinetic energy in cluster extensions depends on the 
pcriodization scheme and we will not go into this details 
here. Kinetic energy of the general single site DMFT 
Ekin = Tr[if£Gk] can be computed by 



(34) 



E km = Tr[AG] + Tr[( M + E imp )n} 



(40) 
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The total energy is therefore given by 

(H) = (H loc ) + Tr[AG] + (in (41) 

The first term Hi oc can be computed very precisely 
in simulation. The sampled quantity (O) is just the 
energy of an atomic state and can be simply ob- 
tained from the probabilities for atomic states (Hi oc ) = 
Y.meaii-states P inE m . _ Computing kinetic energy from 
the Green's function gives in general worse accuracy be- 
cause the high-frequency behavior of the Green's func- 
tion can not be directly sampled and augmentation with 
analytically computed tails is necessary. However, it is 
simple to show that the average value of the perturbation 
order is related to the average of the kinetic energy 

(k) = ~Tr[AG\ (42) 

where (k) is the average perturbation order and T is tem- 
perature. The later quantity is directly sampled in the 
present algorithm and it is just the center of gravity of 
the histogram (presented in Fig. []}. Finally, the total 
energy E is given by 

E tot = (H local )-T(k) + fm (43) 

All quantities in this equation can be computed to very 
high accuracy and since low temperatures can be reach- 
able in this method, the entropy can be obtained by in- 
tegrating the specific heat. 

Superconductivity: The power of the method can be 
further demonstrated by studying the superconducting 
state of the strongly correlated systems at low tempera- 
ture with essentially no performance cost. By employing 



Here Yj, are spheric harmonics, C/™^ arc Clebsch- 
Gordan coefficients and F k are Slater integrals. The F°, 
usual Hubbard U, is commonly computed by constrained 
LDA, while the rest of the Slater integrals (F 2 , F 4 and 
F 6 ) are computed using atomic physics program^. Fi- 
nally, the spin-orbit strength £ is computed within LDA 
program and needs to be updated during self-consistent 
LDA+DMFT calculation. 

Exact diagonalization of the atomic Hamiltonian leads 
to eigenstates with conserved number of particles N, to- 
tal angular momentum J and its z component \NJ Z ; J7). 

For the one electron base, it is more convenient to chose 



the Nambu formalism, the translationally invariant clus- 
ter methods (for details on Ccllular-DMFT on a plaque- 
tte see Appcndix[Aj results in N c two dimensional baths, 
where N c is the number of cluster momenta. Namely, 
the baths {K,f} and {— K, [} arc coupled through the 
anomalous component of hybridization and require si- 
multaneous treatment in Det(A) in Eq. QlOj) . The deter- 
minants are on average twice as large as in normal state, 
however, the cluster part of the trace in Eq. (JTOj) remains 
unchanged. Since most of the time is usually spend in 
evaluating the local part of Z, the performance is not 
noticeably degraded in superconducting state. In typi- 
cal run presented below, the histogram is peaked around 
k = 250 — 500 which is equal to the order of a typical di- 
agram. In the translational invariant representation em- 
ployed here, the size of a typical determinant in Eq. (JTUJ) 
is only k/N c ~ 60 — 120 and using fast-update scheme 
presented above, the trace over the bath states is not 
expensive part of the algorithm. 

Hund's coupling and spin-orbit coupling: In materi- 
als with open / orbitals, the multiplet effects are very 
strong and SU (N) approximation is not adequate. Si- 
multaneous inclusion of Hund's coupling and spin-orbit 
coupling in DMFT method is crucial for quantitative de- 
scription of Actinidesii. Minimal local Hamiltonian for 
Lanthanide and Actinide materials is 

Hatom = Hjiubbard+Hunds + Hso + E imp fl (44) 

Here Ei mp is the impurity level without the spin-orbit 
coupling since later is included explicitely. The Hund's 
coupling and spin-orbit coupling take the following form 



(45) 
(46) 

I 

the electron total angular momentum (j, j z ) rather than 
(l z , a) since the construction and annhilation operators 
conserve z component of total spin 

^ \N J z ; J 7 > = \N + 1, J z + j z ; J'V) ■ (47) 

Superstates \i) are chosen in a way to simplify the matrix 
elements of the creation operator %p\ In this 
convenient choice is 

\i) = \{NJ Z }). (48) 

In the absence of crystal field splitting, the hybridization 
A is a diagonal matrix (all 14 baths are one-dimensional). 



1 47rF* } 



H Hubbard+Hunds — ^ ^ ^ ^ 2k -\- 1 ^ I ) (^'d I ^Aim \^L b ) f\ Ja(T f\ Jh( j' f L d a' f L c a 

HSO = £ + 1) - 1(1 + 1) - ^CfZCZ^fLjlrn'*' 



jrrij ,lmm' ,<ra ' 
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However, strong crystal fields splitting generates off- 
diagonal components of &{jj z }{j'f } and in general, the 
determinant Det(A) in Eq. © can not be broken into 
small determinants as in Eq. (flTjj) . 

III. RESULTS 

We implemented Cellular DMFT for both the Hubbard 
and the t-J model on a plaquette. The energy scales are 
given in units of t, Hubbard U is fixed at 12U while 
J of the t-J model is set to J = 0.3. We simulated 
5.000.000 Monte Carlo steps per processor and results 
were averaged over 64 processors. One DMFT step for 
the Hubbard model takes approximately 45 minutes on 
1.7 GHz PC processor therefore each Monte Carlo step 
requires on average one million clock-ticks (0.9 Mflops). 




100 200 300 400 500 

perturbation order 



FIG. 1: The perturbation order histogram shows the distri- 
bution of the typical perturbation order of the diagrams in 
the simulation. The histogram is peaked around the typical 
order, which is related to temperature and kinetic energy by 
(k) = \E km \/T. 

Fig. [1] shows histograms (probability distribution for 
the perturbation order k) for few dopings of the Hubbard 
model 5 = 1—n at T = l/200£. The average perturbation 
order is increasing with doping since the electrons are 
getting more delocalized (absolute value of the kinetic 
energy is increasing) and the creation of kinks becomes 
less expensive. It is of the order of 450 in overdoped 
regime, but the typical size of the determinant is only 
112 in superconducting state or 56 in normal state. 

We recently addressed the problem of coherence scale 
in the t-J model^ and we found, using NCA as the im- 
purity solver, that the imaginary part of the cluster self- 
energy X(7r,o) i which plays the crucial role in the optical 
conductivity and transport, becomes very large at opti- 
mal doping and consequently the coherence scale vanishes 
around optimal doping. Here we extend this study to the 
Hubbard model using much lower temperature. We will 
show that the system become strongly incoherent at opti- 
mal doping and the maximum of T c tracks the maximum 
scattering rate in both the Hubbard and the t-J model. 



Fig. [2] shows the imaginary part of the self-energy 
S( 7r )( l ' w ) a t few different dopings and temperature T = 
O.Oli which is around the superconducting critical tem- 
perature of this approach. The system is still in normal 
state. It is clear that the self-energy at large frequencies 
is a monotonic function of doping and is largest at the 
Mott transition (5 = 0. However, the low frequency region 
is distinctly different and the crossing of self-energies is 
observed around uu n ~ O.li. Low doping as well as large 
doping self-energies can be extrapolated to zero, while at 
optimal doping self-energy remains of the order of unity 
even around the critical temperature. 



T=0.01 t 




FIG. 2: The imaginary part of the cluster self-energy 2(^,0) 
on imaginary axis. Temperature T = O.Olr. is around the crit- 
ical temperature but still in normal state. At low frequency, 
the most incoherent self-energy corresponds to optimal doped 
and not to the underdoped system. 



T=0.005t t -j T=0.005t Hubbard 




FIG. 3: The upper panel shows the cluster £(,r,o) self-energy 
on the imaginary axis for both the t-J (left) and Hubbard 
(right) model showing the large scattering rate at optimal 
doping. The lower panel shows the anomalous self-energy for 
the same models and doping levels and can be used to locate 
the optimal doped regime. 

The upper panels of Figure[3]show the same self-energy 
at small frequency for both the t-J (left) and the Hubbard 
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(right) model. In this figure, temperature is T = 0.005t 
and is far below T c therefore all curves extrapolate to 
zero and the system becomes coherent in superconduct- 
ing state. However, the reminiscent of the strong in- 
coherence at optimal doping is the large slope of imagi- 
nary part of £(z<x>) which induces very small quasiparticle 
residue in this regime. 

The precise position of the optimal doping, character- 
ized by the largest low frequency anomalous self-energy, 
is different in the t-J and in the Hubbard model (See 
lower panel of Fig. [3]). It is around S = 0.16 in the for- 
mer and around S = 0.1 in the later. However, the large 
scattering rate is ultimately connected with the largest 
anomalous self-energy and hence largest critical temper- 
ature in both the t-J and the Hubbard model. 

In our view, the most important advantage of the new 
Quantum Monte Carlo method is that it can treat the 
realistic multiplet structure of the atom. It was shown in 
Ref. [n| that Hubbard term only (Fq), leads to sever un- 
derestimation of the interaction strength in Actinides and 
for realistic values of Fg, DMFT predicts heavy fermion 
state in Curium rather than magnetic state. Negligence 
of the multiplet structure of Plutonium misses the fine 
structure of the quasiparticles (two peaks around 0.5 and 
0.85 eV) and more importantly, predicts only weakly cor- 
related metallic state in delta phase of Plutonium. 

To demonstrate the advantage of the new method, 
we recomputed the localization-delocalization transition 
in the archetype material exhibiting Kondo collapse, 
namely the a — > 7 transition of elemental Cerium. Since 
the number of electrons in Cerium fluctuates between 
states with zero, one and two electrons in the /-shell, the 
number of atomic states that needs to be kept is rela- 
tively small hence solving the impurity problem requires 
very little computational power in this case. In Fig. [4] 
we show the "valence histogram"— of the two phases of 
Cerium, i.e., the projection of the density matrix to the 
eigenstates of the atom. The plot shows the probability 
to find an / electron of Cerium in any of the atomic eigen- 
states and demonstrates how strongly is the atom fluc- 
tuating between atomic states. The typical fluctuating 
time is inversely proportional to the Kondo temperature 
of the phase, being around 2000 A' in the alpha phase 
and around 80 K in the gamma phase. The itinerant al- 
pha phase histogram is peaked for many atomic states, 
including the spin-orbit split 5/2 and 7/2 singly occupied 
states as well as the empty state. On the other hand, the 
local-moment gamma phase is peaked only at the ground 
state of the singly occupied sector with 5/2 spin show- 
ing that the DMFT ground state closely resembles the 
atomic N = 1 ground state. 



IV. CONCLUSION 

We generalized the recently developed continuous time 
Quantum Monte Carlo expansion around the atomic 
limiti to clusters treated within Cellular DMFT or Dy- 
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FIG. 4: Projection of the DMFT ground state of alpha and 
gamma Cerium to various atomic configurations of Cerium 
atom. The histograms describe the generalized concept of va- 
lence, where the / electron in the solid spends appreciable 
time in a few atomic configurations. The height of the peak 
corresponds to the fraction of the time the / electron of the 
solid spends in one of the eigenstates of the atom, denoted by 
the total spin J of the atom. We summed up the probabil- 
ities for the atomic states which differ only in the z compo- 
nent of the total spin J z . The x axis indicates the energy of 
atomic eigenstates in the following way: Energy(N — 1, J) = 
E at0 m{N, ground— state) — E a tom(N — 1, J) and Energy(N+ 
1, J) = E a tom{N + 1, J) - E at0 m(N, ground — state), where 
N is between and 2. 



namical Cluster Approximation as well as to materials 
that require realistic Hund's coupling and multiplet split- 
ting of the atomic state. We explained the steps neces- 
sary for the efficient implementation of the method. Our 
low temperature data in the strongly correlated regime 
of the Hubbard and t-J model show the efficiency of the 
method and demonstrate its superiority compared to the 
conventional Hirsh-Fye Quantum Monte Carlo Method. 
The long-standing problem of adequate treatment of the 
multiplet splitting within DMFT is resolved. This split- 
ting is crucial in actinides and its omission can lead to 
wrong prediction of the magnetic nature of the DMFT 
ground state solution. 

We showed that the optimal doped regime in both the 
t-J and the Hubbard model is characterized by the largest 
scattering rate and the system becomes more coherent 
in the underdoped and overdoped regime. The precise 
position of the optimal doping, as determined from the 
maximum in anomalous self-energy, is different in the two 
models. However, the strong incoherence is always found 
at the doping corresponding to maximum Tq- 

We computed the valence histogram across the Cerium 
alpha to gamma transition with emphasis on the multi- 
plet splitting of the atomic states. We showed that the 
atom fluctuates between many atomic states in itiner- 
ant alpha phase and both the 5/2 and 7/2 spin-orbit 
split states have large probability in the ground state of 
the system. The empty state and the doubly occupied 
states, which are substantially split by Hund's coupling, 
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acquire a finite probability. The gamma phase, on the 
other hand, shows well defined valence nf ~ 1 and charge 
fluctuations become rare. 
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APPENDIX A: CELLULAR DMFT IN 
DIAGONAL REPRESENTATION 

For the present Continuous Time Monte Carlo method, 
it is convenient to pick the base such that the local quan- 
tities are diagonal. Cellular DMFT uses a generalized 
open boundary conditions and therefore, in general, mo- 



mentum is not a good quantum number. However, for 
small clusters, such as plaquette, all sites are equivalent 
and translational invariancc is still obeyed. The local 
Green's function and hybridization in momentum base 
take the diagonal form 



/ G 



G 



cluster 



(0,0) 








o \ 





G(k,0) 














G(0,7r) 

















(Al) 



In superconducting state Gk is a 2 x 2 matrix in Nambu 
notation. Although the local quantities are diagonal, 
non-interacting Hamiltonian at general k-points is not. 
The self-consistency condition in superconducting state 
takes the following form 
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where we defined 



= t' sin k x sin k y 

= sin k x (t + t' cos k y ) 



Vq 

Vl 

v 2 = sin k y (t + t' cos k x ) 
v 3 = sin k x {t — t' cos k y ) 
= sin k v (t — t' cos k y ) 

sky) - t'{l 



(A3) 



?;4 = sin k y (t — t' cos k y ) 

€q = — 1{2 + cos k x + cos k y ) —1/(1 + cos k x cos k y ) 
ei = t(cos k x — cos k y ) + t'(l + cos k x cos k y ) 
e 2 = —t(cosk x — cos k y ) + t'(l + cosk x cos k y ) 



£3 



and assumed 



— cos k y ) 

cos k x cos k y 
i(2 + cos k x + cos k y ) — t'(l + cos k x cos k y 
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Here En is the normal on-site self-energy, £12 is the 
nearest-neighbor and £13 is the next-nearest neighbor 
sclf-cncrgy and <^>,'s are the anomalous components of 
the self-energy. For d-wave symmetry, </>q and 4> 3 vanish 
and 4>\ = — </>2- 



The advantage of this formulation of the Cellular 
DMFT is that the hybridization becomes block diagonal 
and hence the determinants (DetA) which enter Eq. © 
can be broken up into separate contribution for each mo- 
mentum K point like in Eq. (|10p . 
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